Abstract

Recurrence plots provide a graphical representation of the recurrent patterns in a timeseries,
the quantification of which is a relatively new field. Here we derive analytical expressions
which relate the values of key statistics, notably determinism and entropy of line length
distribution, to the correlation sum as a function of embedding dimension. These expressions
are obtained by deriving the transformation which generates an embedded recurrence plot
from an unembedded plot. A single unembedded recurrence plot thus provides the statistics
of all possible embedded recurrence plots. If the correlation sum scales exponentially with
embedding dimension, we show that these statistics are determined entirely by the exponent
of the exponential. This explains the results of Iwanski and Bradley (Chaos 8 [1998] 861-871)
who found that certain recurrence plot statistics are apparently invariant to embedding
dimension for certain low-dimensional systems. We also examine the relationship between the
mutual information content of two timeseries and the common recurrent structure seen in
their recurrence plots. This allows time-localized contributions to mutual information to be
visualized. This technique is demonstrated using geomagnetic index data; we show that the
AU and AL geomagnetic indices share half their information, and find the timescale on which
mutual features appear.



1 Introduction 



Patterns are ubiquitous in nature, where their presence may imply inherent predictability. As a 
result there is great interest in developing methods for detecting and quantifying patterns, leading 
to quantitative measures of structure, similarity, information content, and predictability. Here we 
consider recurrence plots, which offer a means to quantify the pattern within a timeseries, and
also the pattern shared between two timeseries. 

Recurrence plots are a method for visualizing recurrent patterns within a timeseries or sequence. 
They were first proposed in 1981 by Maizel and Lenk as a method of visualizing patterns in
sequences of genetic nucleotides. They have since been introduced into the study of dynamical
systems, where much effort has been put into building quantification schemes for the plots and
for the patterns within them. There are now many quantitative recurrence plot measures available.
These have been applied with success to patterns as diverse as music, climate variation,
heart rate variability [7], webpage usage [8], video recognition [9], and the patterns in written
text and computer code [10].

In outline, a data series $S$ can be considered as a set of $n$ scalar measurements

$$S = \{s_1, s_2, s_3, \ldots, s_n\} \qquad (1)$$

from which a sequence of $N$ $d$-dimensional vectors $\mathbf{a}_k$ can be constructed using a procedure known
as time-delay embedding. The vectors are defined as

$$\mathbf{a}_k = \{s_k, s_{k+\tau}, s_{k+2\tau}, \ldots, s_{k+(d-1)\tau}\} \qquad (2)$$

where $\tau$ is a delay parameter and $d$ is known as the embedding dimension [11]; these parameters
are typically chosen independently of the recurrence plot technique, for example see [12]. A
recurrence plot is constructed by considering whether a given pair of these coordinates are nearby
in the embedding space. Typically, the maximum norm is used,

$$\|\mathbf{a}_i - \mathbf{a}_j\| = \max_k \{|s_{i+k} - s_{j+k}|\} \qquad (3)$$

so that the distance between two coordinates equals the maximum distance in any dimension. A
recurrence plot is represented by a tensor $T_{ij}$ whose elements correspond to the distance between
each of the $N^2$ possible pairs of coordinates $\mathbf{a}_i$, $\mathbf{a}_j$ [2]:

$$T_{ij} = \Theta(\epsilon - \|\mathbf{a}_i - \mathbf{a}_j\|) \qquad (4)$$

where $\Theta$ is a step function (0 for negative arguments, 1 for positive arguments). For each pair
of coordinates in the series whose separation is less than the threshold parameter $\epsilon$, $T_{ij}$ takes the
value unity, which can be plotted as a black dot on an otherwise white graph.
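
As an illustration, the construction of Eqs. (1) to (4) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used to produce the results reported here: the function names `embed`, `max_norm_distance` and `recurrence_plot` are hypothetical, and the period-3 test signal is invented purely for demonstration.

```python
def embed(series, d, tau):
    """Time-delay embedding, Eq. (2): a_k = (s_k, s_{k+tau}, ..., s_{k+(d-1)tau})."""
    n_vectors = len(series) - (d - 1) * tau
    return [tuple(series[k + m * tau] for m in range(d)) for k in range(n_vectors)]


def max_norm_distance(a, b):
    """Maximum norm, Eq. (3): the largest componentwise separation."""
    return max(abs(x - y) for x, y in zip(a, b))


def recurrence_plot(series, eps, d=1, tau=1):
    """Thresholded plot, Eq. (4): 1 where the separation is below eps, else 0."""
    vectors = embed(series, d, tau)
    return [[1 if max_norm_distance(a, b) < eps else 0 for b in vectors]
            for a in vectors]


# Illustrative period-3 signal (an assumption, not data from the text):
# recurrences fall on diagonals spaced three samples apart.
signal = [0.0, 1.0, 2.0] * 3
plot = recurrence_plot(signal, eps=0.5)
for row in plot:
    print("".join("#" if value else "." for value in row))
```

Printing the rows with `#` for black dots gives a quick text rendering of the plot; for the regular signal above, the black dots line up along equally spaced diagonals, as described for periodic signals in the following paragraph.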

A recurrence plot of independent and identically distributed (IID) data appears as a random 
scattering of black dots, while a regularly repeating signal (such as a sine wave, e.g. see Fig. 1 
of [2]) appears as a series of equally spaced, 45° diagonal black lines. An irregularly repeating
signal (such as the output of a chaotic system) typically appears as a pattern of small diagonal 
lines of varying length. Paling of the plot away from the main diagonal indicates that the longer 
one observes no repeat of a particular feature, the less likely a repeat is to occur. In this case 
it follows that probability depends on time, and therefore that the process which generated such 
data is non-stationary. 

In this paper we investigate the statistics of recurrence plots, and their meaning in relation to well 
understood statistics from nonlinear timeseries analysis. First, we examine the meaning of two of 
the key statistics in recurrence quantification analysis (RQA), namely the determinism and the
entropy of line length distribution [3], and the effect on them of the time-delay embedding procedure
[14, 11]. Iwanski and Bradley found that the appearance and statistics of recurrence plots for
certain low-dimensional systems are not significantly altered by a small change in the embedding
dimension $d$, suggesting that these statistics may be important new invariant characteristics of a
system. However, unlike traditional measures where invariance relies on the embedding dimension
being sufficiently high, Iwanski and Bradley found the same statistics for an unembedded
recurrence plot as for an embedded version. This was further examined by Gao and Cai [15], who
suggested that many recurrence plot statistics may rely on information from a higher embedding
dimension than was used to construct the recurrence plot. However, this does not completely explain
why these quantities appear to be invariant with respect to the embedding dimension, nor
whether these quantities are independent of each other, or of other better known measures. This
is important, since independent quantities potentially yield new information about a system. In
section 2 we show that all embedded recurrence plots are present within the unembedded plot,
accessible via a simple transformation. Using this transformation, we derive in section 3 the effect
of embedding on two RQA statistics: determinism, and entropy of line length distribution. For the
case of exponential scaling of the correlation sum [see Eq. (9) below] with embedding dimension,
which might be expected for certain low-dimensional systems, we derive expressions which relate
these quantities to the Kolmogorov entropy rate [14]. This is important for two reasons. First, it
provides a new perspective on the physical meaning of these quantities. Second, it can be used
to establish baseline values for independent and identically distributed (IID) processes, above or
below which a measurement can be said to be significant.

In section 4 we examine the converse question of how well-known statistics from nonlinear timeseries
analysis relate to recurrence plots. We demonstrate that a standard algorithm for computing
the mutual information between two timeseries is related to counting the number of black dots 
common to the recurrence plots of the two timeseries in question. This suggests the definition 
of a new form of cross recurrence plot which, when drawn, allows contributions to the mutual 
information to be visualized. We apply this technique to a physical system in which issues of 
predictability and correlation are of practical interest. Earth's geomagnetic activity is monitored 
by a non-uniformly distributed circumpolar ring of magnetometers, which measure fluctuations in 
horizontal magnetic field strength due to enhancements in auroral activity. These measurements 
are compiled to form the AE geomagnetic indices, of which we consider AU (a proxy for the
maximum eastward flowing polar current) and AL (a proxy for the maximum westward flowing 
current). In common with many other "real world" timeseries, these timeseries show both low and 
high dimensional behavior, in this case well defined features on timescales of days (storms) which 
are embedded in colored noise [17].



2 Effect of Embedding Dimension 

We now derive a transformation which generates an embedded recurrence plot from an unembedded
recurrence plot. This result is central to the subsequent discussion of the effect of embedding
on statistics derived from recurrence plots. A single recurrence on an unembedded $d = 1$ plot is
represented by a single black dot, corresponding to a pair of data points closer together than $\epsilon$.
If we consider Fig. 1 (left) to represent part of a $d = 1$ recurrence plot, the example illustrated
relates to points numbered 2 and 8, i.e.

$$|a_2 - a_8| < \epsilon \qquad (5)$$

Figure 1 (right) shows a line of length two. Still taking $d = 1$, the situation represented is

$$|a_2 - a_8| < \epsilon \quad \text{and} \quad |a_3 - a_9| < \epsilon \qquad (6)$$

Figure 1: Representation of diagonal lines of length one and two on a recurrence plot, corresponding
to pairs of points in the original timeseries. 



Consider forming coordinates in a $d = 2$, $\tau = 1$ embedding space [see Eq. (2)]. If we consider Fig.
1 (left) to represent a region of a $d = 2$ recurrence plot, the black dot now represents

$$\|\mathbf{a}_2 - \mathbf{a}_8\| < \epsilon \qquad (7)$$

where $\mathbf{a}_n$ now denotes $\{a_n, a_{n+1}\}$. Using the maximum norm, Eq. (3), this is equivalent to Eq. (6).
Therefore a single dot in $d = 2$ represents a line of length two in $d = 1$. The transformation from
$d = 1$ to $d = 2$ thus reduces the length of all diagonal lines by one dot. An isolated dot is removed
entirely.



Formally, we represent the transformation to arbitrary dimension as 

$$T_{ij}(d) = T_{ij}(1) \times T_{i+\tau,\,j+\tau}(1) \times T_{i+2\tau,\,j+2\tau}(1) \times \cdots \times T_{i+(d-1)\tau,\,j+(d-1)\tau}(1) \qquad (8)$$

An element on the recurrence plot with embedding dimension $d$ is thus related to a diagonal
sequence of $d$ elements on the unembedded recurrence plot $T_{ij}(1)$. This transformation enables
the conversion of an unembedded recurrence plot into any embedded recurrence plot with any
values of $d$ or $\tau$. This suggests that embedding in the construction of recurrence plots is not
strictly necessary, since all of the information is contained within the unembedded plot $T_{ij}(1)$;
let us refer to this as the parent plot. Rather than performing embedding, information can be 
extracted directly from this parent plot. Understanding how the information is contained in 
the parent plot assists in consideration of how various recurrence plot statistics are affected by 
embedding. 
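
The transformation of Eq. (8) can be checked numerically. The following is a minimal sketch, assuming a short logistic map series as test data; the function names `direct_plot` and `plot_from_parent`, the series length and the initial condition are hypothetical choices made for illustration only. It builds the parent plot once, generates child plots from it by multiplying elements along diagonals, and compares each child with the plot obtained by explicit embedding.

```python
def direct_plot(series, eps, d=1, tau=1):
    """Recurrence plot built by explicit embedding with the maximum norm."""
    n = len(series) - (d - 1) * tau

    def close(i, j):
        # Max norm below eps is the same as every component being below eps.
        return all(abs(series[i + m * tau] - series[j + m * tau]) < eps
                   for m in range(d))

    return [[1 if close(i, j) else 0 for j in range(n)] for i in range(n)]


def plot_from_parent(parent, d, tau=1):
    """Eq. (8): product of d parent elements stepping diagonally by tau."""
    n = len(parent) - (d - 1) * tau
    child = [[1] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for m in range(d):
                child[i][j] *= parent[i + m * tau][j + m * tau]
    return child


# Illustrative data: logistic map with mu = 4 (initial value is an assumption).
x, series = 0.3, []
for _ in range(120):
    series.append(x)
    x = 4.0 * x * (1.0 - x)

parent = direct_plot(series, eps=0.1)
matches = [plot_from_parent(parent, d, tau) == direct_plot(series, 0.1, d, tau)
           for d, tau in [(2, 1), (3, 1), (3, 2)]]
print(matches)
```

Because the maximum norm is below the threshold exactly when every component separation is below it, the two constructions should agree element by element, which is what the comparison tests.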



3 Meaning of Recurrence Plot Statistics 

Given the transformation derived above, we now consider two of the key statistics of recurrence 
plots, namely the determinism and the entropy of the diagonal line length distribution. We show 
that both these statistics are related to the correlation sum, and also relate them to the probability 
distribution of line lengths on an unembedded plot. In the case of exponential scaling of the 
correlation sum with embedding dimension, we show that they do not depend on the embedding 
dimension d. 

Recurrence quantification analysis (RQA) provides a set of statistical measures which have been
proposed to quantify patterns based on the lines and dots visible on a recurrence plot. The
fraction of the plot colored black is the most fundamental statistic associated with recurrence
plots. This is known as the recurrence rate in RQA, and is known elsewhere as the correlation
sum $C_d(\epsilon)$ [14, 18]. $C_d(\epsilon)$ is the fraction of pairs of coordinates closer together than $\epsilon$, and is
defined by

$$C_d(\epsilon) = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \Theta(\epsilon - \|\mathbf{a}_i - \mathbf{a}_j\|) \qquad (9)$$

A recurrence plot can be considered to be a two-dimensional pictorial representation of the points
that contribute to Eq. (9) for a particular value of $\epsilon$.

The remaining statistics in RQA are the fraction of black dots involved in diagonal lines, known
as the determinism $D_d$, the entropy of the diagonal line length distribution $E_d$, the ratio of
determinism to correlation sum, and the slope of the line of best fit on a graph of recurrence
probability versus distance from main diagonal, known as the trend. Except for the trend, these
statistics can be related to the probability distribution of diagonal line lengths $P_d(L)$, which is
the probability of observing a diagonal black line of length $L$ beginning from a randomly selected
element of the recurrence plot. From Eq. (8), the distribution of line lengths on an embedded
recurrence plot is related to the distribution on an unembedded plot by

$$P_d(L) = P_1(L + d - 1) \qquad (10)$$

Hence any statistic formed from the embedded $P_d(L)$ can be constructed from the unembedded
$P_1(L + d - 1)$. For example, using Eq. (10), the correlation sum can be written as

$$C_d = \sum_{L=d}^{\infty} (L - d + 1)\, P_1(L) \qquad (11)$$

This relationship can be reversed to give 

$$P_1(L) = C_{L+2} - 2C_{L+1} + C_L \qquad (12)$$

Hence any statistics derived from $P_1(L)$ can also be derived from the correlation sum, as we now
explicitly show. 

First we consider the determinism $D_d$, which was observed to be invariant to embedding
dimension by Iwanski and Bradley [13]. This is the ratio of black dots included in lines of length
greater than unity to the total number of black dots. The determinism $D_d$ quantifies the prevalence
of lines, and is believed to quantify how deterministic a system is. This can be related to the
probability $C_d$ of observing a black dot in a randomly selected location, and to the probability of
observing an isolated black dot. The number of black dots included in lines is equal to the total
number of black dots minus the number of isolated black dots (lines of length unity), so we can
write

$$D_d = \frac{C_d - P_1(d)}{C_d} \qquad (13)$$

Using Eq. (12) to express $P_1(d)$ in Eq. (13), we have

$$D_d = \frac{2C_{d+1} - C_{d+2}}{C_d} \qquad (14)$$

Thus the determinism at embedding dimension d can be inferred from knowledge of the correlation 
sum at nearby embedding dimensions d, d + 1 and d + 2. 
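
A minimal sketch of how Eq. (14) might be applied in practice is given below. It estimates the correlation sum of Eq. (9) at several embedding dimensions (with delay 1) and combines three adjacent values into a determinism estimate. The function names, the series length of 400 and the initial condition of the logistic map are assumptions made for illustration; they are not the settings used for the figures in this paper.

```python
def logistic_series(n, x0=0.3, mu=4.0):
    """Iterate x -> mu x (1 - x); x0 and n are illustrative choices."""
    x, out = x0, []
    for _ in range(n):
        out.append(x)
        x = mu * x * (1.0 - x)
    return out


def correlation_sum(series, eps, d):
    """Eq. (9) with delay 1: fraction of vector pairs (i < j) closer than eps."""
    n = len(series) - (d - 1)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if all(abs(series[i + m] - series[j + m]) < eps for m in range(d)):
                close += 1
    return 2.0 * close / (n * (n - 1))


def determinism_eq14(c, d):
    """Eq. (14): D_d from correlation sums at d, d + 1 and d + 2."""
    return (2.0 * c[d + 1] - c[d + 2]) / c[d]


series = logistic_series(400)
c = {d: correlation_sum(series, 0.1, d) for d in range(1, 6)}
for d in range(1, 4):
    print(d, round(c[d], 4), round(determinism_eq14(c, d), 3))
```

The estimate relies on the line length statistics being the same everywhere on the plot, so on a short series the values fluctuate; no particular output is claimed here.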

The next statistic in the RQA is the Shannon entropy of the line length distribution [3]. This is
defined as

$$E_d = -\sum_{L=1}^{\infty} Q_d(L) \ln Q_d(L) \qquad (15)$$

where $Q_d(L)$ is the probability of observing a line of length $L$ given the fact that a line is observed.
This can be related to the probability $P_d(L)$ of observing a line of length $L$, and the probability
of observing a line of arbitrary length. Using Eqs. (10) and (12) we obtain

$$Q_d(L) = \frac{C_{L+d+1} - 2C_{L+d} + C_{L+d-1}}{C_d - C_{d+1}} \qquad (16)$$


Hence, like the determinism, the Shannon entropy of line length distribution can be obtained from 
the correlation sum. 



3.1 Exponential scaling of correlation sum 



Suppose we assume that the correlation sum $C_d$ can be expressed as an inverse exponential function
of $d$ with exponent $K_2$. This is strictly true for data derived from an IID process, and is observed
for many low-dimensional chaotic processes under certain conditions [14]; in this case $K_2$ is known
as the Kolmogorov entropy rate. It has been previously shown that this can be extracted from
the distribution of recurrence plot diagonal line lengths $P_d(L)$. We write the correlation sum
as

$$C_d = A e^{-K_2 d} \qquad (17)$$

where we have absorbed the dependence of $C_d$ on the threshold parameter $\epsilon$ into the constant $A$.
Substitution of Eq. (17) into Eq. (12) yields

$$P_1(L) = A (1 - e^{-K_2})^2 e^{-K_2 L} \qquad (18)$$

This implies that $P_1(L)$ is an exponential function of $L$ with the same exponent $K_2$ that governs
the dependence of $C_d$ on $d$. This result has been derived independently by an alternative route
which considers the divergence of trajectories directly [15].

From Eqs. (13) and (18), the determinism $D_d$ can be written

$$D_d = 1 - \frac{A e^{-K_2 d} (1 - e^{-K_2})^2}{A e^{-K_2 d}} \qquad (19)$$

This simplifies to give

$$D_d = 1 - \gamma^2 \qquad (20)$$

where we define $\gamma = (1 - e^{-K_2})$. For exponential scaling of $C_d$, the determinism is a constant
independent of the embedding dimension $d$ chosen, and is determined by the exponential scaling
exponent. Where the correlation sum only exhibits exponential scaling over a limited range of
embedding dimensions (such as might be expected for a low-dimensional chaotic process), this
expression remains true, since Eq. (14) only relies on knowledge of adjacent (in $d$) correlation
sums.

To derive the Shannon entropy of line length distribution, Eq. (15), we insert Eq. (17) into Eq. (16)
to give

$$Q_d(L) = (1 - e^{-K_2}) e^{-K_2 (L-1)} \qquad (21)$$

which when inserted into Eq. (15) gives

$$E_d = K_2 \left( \frac{1}{\gamma} - 1 \right) - \ln \gamma \qquad (22)$$

As with $D_d$, this is independent of the embedding dimension $d$. However, unlike Eq. (20), this
expression is only true in the case of perfect exponential scaling.
Figure 2: Correlation sum $C_d$ computed as a function of embedding dimension $d$ for $10^5$ samples
of the logistic map with $\epsilon = 0.1$. Applying Eq. (17) to the measured straight line slope gives
$K_2 = 0.6349 \pm 0.0004$.

Figure 3: Determinism $D_d$ computed as a function of embedding dimension $d$ for $10^5$ samples of
the logistic map with $\epsilon = 0.1$, shown as asterisks. Solid line shows theoretical prediction of 0.7791
obtained from Eq. (20) using the measured value of $K_2$ from Fig. 2.

Figure 4: Shannon entropy of line probability distribution $E_d$ computed as a function of embedding
dimension $d$ for $10^5$ samples of the logistic map with $\epsilon = 0.1$, shown as asterisks. Solid line shows
theoretical prediction of 1.4709 obtained from Eq. (22) using the measured value of $K_2$ from Fig. 2.

As a demonstration of these results, Fig. 2 shows the correlation sum computed as a function of
embedding dimension for the logistic map, $x_{t+1} = \mu x_t (1 - x_t)$, in the chaotic regime with $\mu = 4$.
This shows reasonable scaling of the correlation sum with dimension, as in Eq. (17), and yields
$K_2 = 0.6349 \pm 0.0004$. By Eq. (20), this implies a value for the determinism $D_d$ of $0.7791 \pm 0.0002$
and by Eq. (22) a value for $E_d$ of $1.4709 \pm 0.0006$. These values are shown on Figs. 3 and 4 as
the solid lines, while the actual values computed from recurrence plots of the data are shown as
asterisks. Until statistical noise becomes important (around $d = 25$ to 30), the points lie convincingly
on the lines.
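
The arithmetic behind these predictions can be reproduced with a short sketch. It evaluates Eqs. (20) and (22) for the quoted value $K_2 = 0.6349$, and cross-checks the closed form of Eq. (22) against a direct summation of Eq. (15) with the distribution of Eq. (21). The function names and the truncation of the sum at 500 terms are illustrative assumptions.

```python
import math


def determinism_from_k2(k2):
    """Eq. (20): D = 1 - gamma^2 with gamma = 1 - exp(-K2)."""
    gamma = 1.0 - math.exp(-k2)
    return 1.0 - gamma ** 2


def entropy_from_k2(k2):
    """Eq. (22): E = K2 (1/gamma - 1) - ln(gamma)."""
    gamma = 1.0 - math.exp(-k2)
    return k2 * (1.0 / gamma - 1.0) - math.log(gamma)


def entropy_by_direct_sum(k2, terms=500):
    """Eq. (15) summed term by term using Q(L) from Eq. (21); truncated sum."""
    gamma = 1.0 - math.exp(-k2)
    total = 0.0
    for length in range(1, terms + 1):
        q = gamma * math.exp(-k2 * (length - 1))
        total -= q * math.log(q)
    return total


k2 = 0.6349  # value quoted in the text for the logistic map
print(round(determinism_from_k2(k2), 4))   # text quotes 0.7791
print(round(entropy_from_k2(k2), 4))       # text quotes 1.4709
print(abs(entropy_by_direct_sum(k2) - entropy_from_k2(k2)) < 1e-9)
```

The agreement between the direct sum and the closed form reflects the geometric form of Eq. (21), for which the series in Eq. (15) can be summed exactly.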



An initial exponential distribution of diagonal line lengths remains exponential after embedding, 
explaining the apparent invariance with respect to d of these statistics for low-dimensional chaotic 
systems [13]. The determinism $D_d$ and the entropy $E_d$ are in this case governed by the exponential
scaling exponent of the correlation sum, $K_2$.

A corollary is provided by the results of Zbilut et al. [19], who applied the techniques of recurrence
quantification analysis to short sequences of random integers, as well as to the logistic map. There
were three sequences considered: (a) consecutive digits of $\pi$; (b) pseudo-random integers generated
with MATLAB; (c) experimentally derived random integers, produced by tuning a radio antenna to
an empty part of the spectrum [20]. All three were considered with sequence lengths of $N = 1000$,
3000 and 5000, and only exact matches were considered to constitute recurrences. This corresponds
to $\epsilon = 0$ in Eq. (4), which is only possible when working with integer sequences; for real-valued
sequences, $\epsilon$ is limited by numerical precision. It was found that for (a) and (b) the determinism
was slightly below 20%, and was defined up to $d_0 = 4$ for $N = 1000$, $d_0 = 5$ for $N = 3000$ and
$d_0 = 6$ for $N = 5000$. However, (c) had a determinism only slightly above 0%, which was defined
only up to $d_0 = 2$ regardless of $N$. The authors suggested that this was possibly due to some
innate randomness that sequence (c) possessed, and suggested the RQA as a test to distinguish
between physical and pseudo random numbers.

For data drawn from an IID process, the probability of a particular dot being black on the $d = 1$
plot is a constant $C_1$, the correlation sum. Successive dots along a diagonal are then independent,
so that $C_d = C_1^d$ and, referring back to the definition of Eq. (17), $e^{-K_2} = C_1$. We can therefore
write

$$\gamma = 1 - C_1 \qquad (23)$$

$$D_d = C_1 (2 - C_1) \qquad (24)$$

$$E_d = -\frac{C_1}{1 - C_1} \ln C_1 - \ln(1 - C_1) \qquad (25)$$

These quantities are finite for IID data because a small number of lines are created by chance.
To conclude that observed data are non-random, the values measured must be compared with
Eqs. (24) and (25) to establish the significance of the result.
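
The IID baseline can be compared with a direct count on simulated data. The sketch below is illustrative only: it draws a pseudo-random integer sequence with Python's standard `random` module (the seed, the sequence length of 600 and the alphabet of 10 symbols are arbitrary assumptions), counts the fraction of exact-match recurrences that have a diagonal neighbour, and prints it next to the baseline of Eq. (24). The function names are hypothetical.

```python
import math
import random


def iid_determinism(c1):
    """Eq. (24): baseline determinism for IID data."""
    return c1 * (2.0 - c1)


def iid_line_entropy(c1):
    """Eq. (25): baseline line length entropy for IID data."""
    return -c1 / (1.0 - c1) * math.log(c1) - math.log(1.0 - c1)


def measured_determinism(symbols):
    """Fraction of exact-match dots (i < j) that touch another dot diagonally."""
    n = len(symbols)
    black = in_lines = 0
    for i in range(n):
        for j in range(i + 1, n):
            if symbols[i] != symbols[j]:
                continue
            black += 1
            before = i > 0 and symbols[i - 1] == symbols[j - 1]
            after = j + 1 < n and symbols[i + 1] == symbols[j + 1]
            if before or after:
                in_lines += 1
    return in_lines / black


rng = random.Random(1)  # fixed seed, chosen arbitrarily
symbols = [rng.randrange(10) for _ in range(600)]
c1 = 1.0 / 10
print(round(measured_determinism(symbols), 3), round(iid_determinism(c1), 3))
print(round(iid_line_entropy(c1), 3))
```

A measured determinism close to the baseline is what would be expected from chance alone; only values clearly separated from it suggest structure beyond an IID process.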

Integer sequences can be represented as a string of symbols from an alphabet of size $m$. The
probability of observing a black dot in a randomly selected location on the unembedded $d = 1$
plot is given by

$$C_1 = \sum_{i=1}^{m} p_i^2 \qquad (26)$$

where $p_i$ is the probability of observing symbol $i$ from the alphabet. The sequences (a), (b) and
(c) were all uniformly distributed so we can write $p_i$ as

$$p_i = \frac{1}{m} \qquad (27)$$

and

$$C_1 = \sum_{i=1}^{m} \frac{1}{m^2} = \frac{1}{m} \qquad (28)$$



The measured determinism values [19] died out above a particular value of $d$, when no diagonal
black lines were seen on a finite recurrence plot. To estimate the embedding dimension at which
this should occur, we examine the expected number of lines $\langle n \rangle$ on an embedded recurrence
plot. This is given by the total number of elements on the plot multiplied by the probability of
observing, in a randomly selected location, one white dot diagonally followed by $d + 1$ black dots
on the unembedded recurrence plot:

$$\langle n \rangle = \frac{1}{2} N(N-1)(1 - C_1)\, C_1^{d+1} \qquad (29)$$

Setting $\langle n \rangle$ equal to unity gives an estimate for $d_0$, the dimension where the determinism should
die out:

$$d_0 = \frac{\log 2 - \log N(N-1) - \log C_1(1 - C_1)}{\log C_1} \qquad (30)$$



Using Eqs. (28) and (24) with $m = 10$ symbols we obtain $C_1 = 0.1$ and $D_d = 19\%$. From Eq. (30),
this should be measurable a priori up to $d_0 \approx 4.6$ for $N = 1000$, $d_0 \approx 5.6$ for $N = 3000$ and
$d_0 \approx 6.1$ for $N = 5000$, see Fig. 5. Comparing these values with the measured results [19] we
infer that (a) and (b) behave exactly as would be expected for an IID process with no additional
distinguishing properties.

To explain the results for the experimentally derived random integers (c), we consider sequences
of random integers from the same source [20]. The sequences supplied default to the range 1 to
100, an alphabet of $m = 100$ symbols. For this value of $m$ we obtain $C_1 = 0.01$ and from Eq. (24)
we predict a value of determinism $D_d = 1.99\%$. This should persist up to $d_0 \approx 1.8$ for $N = 1000$,
$d_0 \approx 2.3$ for $N = 3000$ and $d_0 \approx 2.5$ for $N = 5000$; this information is summarized in Fig. 5.
This agrees with the result reported in [19], so that there is no reason to infer any additional
randomness property for (c); the results of recurrence quantification analysis can be explained as
a consequence of the different number of symbols in the sequence.
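
The estimate of Eq. (30) is simple to evaluate. The following minimal sketch tabulates $d_0$ for the alphabet sizes and sequence lengths discussed above, assuming a uniform symbol distribution so that $C_1 = 1/m$ as in Eq. (28). The function name `cutoff_dimension` is a hypothetical choice.

```python
import math


def cutoff_dimension(n_samples, alphabet_size):
    """Eq. (30) with C_1 = 1/m from Eq. (28), assuming uniform symbols."""
    c1 = 1.0 / alphabet_size
    numerator = (math.log(2.0)
                 - math.log(n_samples * (n_samples - 1))
                 - math.log(c1 * (1.0 - c1)))
    return numerator / math.log(c1)


for m in (10, 100):
    for n in (1000, 3000, 5000):
        print(m, n, round(cutoff_dimension(n, m), 2))
```

The ratio of logarithms does not depend on the base used. The printed values can be compared with those quoted in the text; differences in the last quoted digit may arise from rounding.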
Figure 5: Observed and predicted [from Eq. (30)] embedding dimension $d_0$ at which determinism
$D_d$ drops to zero, as a result of finite sample size $N$, for sequences of symbols from an alphabet
of size $m = 10$ symbols (columns 2 and 3) and $m = 100$ symbols (columns 4 and 5). Observed
values from [19]: for $m = 10$, consecutive digits of $\pi$ and pseudo-random integers generated with
MATLAB; for $m = 100$, experimentally derived random integer sequence from www.random.org.



4 Mutual Information 



A recurrence plot can be considered as a visualization of the double summation in the definition of 
the correlation sum, Eq. (9). It is therefore reasonable to expect that a proportion of the statistics
derived from recurrence plots would be related to $C_d$. Conversely, it would also be reasonable to
expect that existing statistics related to $C_d$ could be derivable from recurrence plots. A recurrence
plot would then provide a visualization of any such statistic. As an example we consider the mutual
information, which is a nonlinear measure of correlation between two (or more) discrete timeseries.
The mutual information $I_{AB}$ between timeseries $A$ and $B$ is defined by

$$I_{AB} = H(A) + H(B) - H(A, B) \qquad (31)$$

where $H(A)$ is the entropy measured for timeseries $A$ and $H(A, B)$ is the joint entropy, measured
from a joint histogram. For a discrete timeseries, the Shannon entropy is defined by [21]

$$H = -\sum_i p_i \log_2 p_i \qquad (32)$$

where $p_i$ is again the probability of observing symbol $i$ and the summation is taken over all $i$.



There are two standard algorithms for computing the entropy $H$. The first discretizes the
data using a hierarchy of partitions which become finer in regions of the joint histogram that
contain more points. The second approach [22] uses the second Rényi entropy, which can
be approximated by the negative logarithm of the correlation sum, $H_2 \approx -\log_2 C$. Hence we can
write the second Rényi mutual information as

$$I_2^{AB} = \log_2 C^{AB} - \log_2 C^A - \log_2 C^B \qquad (33)$$

where $C^{AB}$ is the joint correlation sum, which is the recurrence rate of the following type of cross
recurrence plot

$$T^{AB}_{ij} = T^A_{ij}\, T^B_{ij} \qquad (34)$$

This definition of a cross recurrence plot differs from the standard definition [21], but has been
recently proposed by Romano et al. [25] as a visualization of recurrent structure common to two
timeseries. Thus we can obtain a standard mutual information estimate from three recurrence
plots: $T^A$, $T^B$ and $T^{AB}$.
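
A minimal sketch of this estimate, using unembedded plots only, is given below. It forms the two auto recurrence plots, multiplies them element by element as in Eq. (34), and combines the three recurrence rates as in Eq. (33). The function names, the uniform pseudo-random test series, the seed and the thresholds are assumptions introduced for illustration; they are unrelated to the geomagnetic data analysed later.

```python
import math
import random


def recurrence_plot(series, eps):
    """Unembedded thresholded plot: 1 where |s_i - s_j| < eps."""
    return [[1 if abs(a - b) < eps else 0 for b in series] for a in series]


def recurrence_rate(plot):
    """Fraction of black dots among pairs i < j, as in Eq. (9)."""
    n = len(plot)
    black = sum(plot[i][j] for i in range(n) for j in range(i + 1, n))
    return 2.0 * black / (n * (n - 1))


def renyi_mutual_information(series_a, series_b, eps_a, eps_b):
    """Eq. (33), with the joint plot of Eq. (34) as an elementwise product."""
    t_a = recurrence_plot(series_a, eps_a)
    t_b = recurrence_plot(series_b, eps_b)
    t_ab = [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(t_a, t_b)]
    c_a, c_b, c_ab = (recurrence_rate(t) for t in (t_a, t_b, t_ab))
    return math.log2(c_ab) - math.log2(c_a) - math.log2(c_b)


rng = random.Random(0)  # fixed seed, chosen arbitrarily
series_a = [rng.random() for _ in range(200)]
series_b = [rng.random() for _ in range(200)]
mi_independent = renyi_mutual_information(series_a, series_b, 0.2, 0.2)
mi_self = renyi_mutual_information(series_a, series_a, 0.2, 0.2)
print(round(mi_independent, 3), round(mi_self, 3))
```

Two limiting cases give a sanity check: for independent series the joint recurrence rate is close to the product of the individual rates, so the estimate is near zero, while for a series compared with itself the joint plot equals the auto recurrence plot and the estimate reduces to $-\log_2 C^A$.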

The mutual information depends on the values of $C^A$ and $C^B$, which in turn are conditioned by
$\epsilon_A$ and $\epsilon_B$, the threshold parameters used to produce the two auto recurrence plots. These two
parameters must be chosen in some fashion, and this choice must be justified. One solution is to
choose the thresholds such that the resulting auto recurrence plots have the same correlation sum.
That is

$$C^A(\epsilon_A) = C^B(\epsilon_B) = C_0 \qquad (35)$$
Figure 6: Days 1 to 14 of the AU and AL timeseries for the year 1995. AU, being the maximum
reading from a network of magnetometer stations, is mostly positive, while AL is mostly negative. 

This choice can be simplified by defining an unthresholded recurrence plot in terms of the measured
correlation sum of the timeseries

$$U_{ij} = C(\|\mathbf{a}_i - \mathbf{a}_j\|) \qquad (36)$$



This recurrence plot has the property that if it is thresholded, then the resulting thresholded
plot will have a recurrence rate (correlation sum) equal to the thresholding parameter. The
corresponding unthresholded cross recurrence plot will now be given by

$$U^{AB}_{ij} = \max\{U^A_{ij}, U^B_{ij}\} \qquad (37)$$



since the definition of a thresholded recurrence plot uses the maximum norm, Eq. (3). This allows
us to write the joint correlation sum as a function of the elements of the joint recurrence plot

$$C^{AB}(C_0) = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \Theta(C_0 - U^{AB}_{ij}) \qquad (38)$$



Thus the joint correlation sum is equal to the recurrence rate of the unthresholded joint recurrence
plot after it has been thresholded with a threshold parameter equal to $C_0$. Following Eq. (33) we
then write the mutual information as

$$I^{AB}(C_0) = \log_2 C^{AB}(C_0) - 2 \log_2 C_0 \qquad (39)$$



To demonstrate the quantitative practical use of this technique, we now apply it to the geomagnetic 
AU and AL timeseries for the year 1995. AU reflects the activity of eastward flowing polar currents, 
induced in the atmosphere by activity deeper in Earth's magnetosphere. AL reflects the activity
of westward currents, and is typically negative. Figure 6 shows these timeseries for the first two
weeks of 1995. AU and AL typically come from opposite sides of the polar current system; they 
are therefore expected to share a certain amount of information due to large scale phenomena 
(storms) which are seen in both AU and AL, but to have differences due to smaller fluctuations 
arising from local phenomena. We use data for the entire year in order to get good statistics. 
Statistical noise acts to decrease the measured mutual information. The variance, due to noise, of 
mutual information measurements has been shown to scale with $1/N$ [26], where $N$ is the number
of data points and here we have $N = 5 \times 10^5$.
Figure 7: Mutual information $I$ for AU and AL geomagnetic timeseries, normalized to entropy
of AU and AL separately. Left: as a function of correlation sum (recurrence rate) $C_0$, see Eq. (39).
Right: as a function of the recurrence threshold parameter $\epsilon$ (in nT) necessary to create the
corresponding underlying thresholded recurrence plots for each measurement.

Within the AU and AL timeseries, three distinct classes of behavior are recognized
phenomenologically: quiet time, storms and substorms. During quiet time, measurements of the order of a
few nT to a few tens of nT are seen. The other extreme is seen during a magnetic storm, with
measurements of hundreds of nT persisting for times of the order of several days. These events
correlate strongly with features on the Sun facing the Earth and thus tend to recur on a 27-28
day timescale (the synodic rotation period of the Sun). The intermediate event is a substorm [28],
during which variations on the scale of tens to hundreds of nT persist for a few hours. Substorms
are believed to result from the sudden release of stored energy built up in the magnetotail by the 
solar wind. 

Figure 7 shows on the left the functional form of $I(C_0)$, the mutual information as a function of
correlation sum of the underlying recurrence plots, obtained for the AU and AL timeseries. On
the right are $I(\epsilon_A)$ and $I(\epsilon_B)$, the mutual information as a function of the underlying threshold
parameters, constructed using Eq. (35). Both figures are normalized to the entropy of AU and
AL considered individually. The maximum fractional mutual information measured is 50% and
corresponds to an underlying correlation sum of $C_0 = 0.52$. To obtain this value of $C_0$, the two
underlying thresholded recurrence plots require thresholds of $\epsilon = 49$ nT for AU and $\epsilon = 103$ nT for
AL. In the right-hand panel, the solid line shows the relative mutual information as a function of
the threshold applied to the AU recurrence plot, while the dotted line shows the same for AL.

Figures 8 and 9 show unthresholded recurrence plots, as defined by Eq. (36), for the AU and AL
geomagnetic timeseries respectively. The cross recurrence plot formed from these using Eq. (37) is
shown in Fig. 10. These plots show which positions in the original timeseries contribute the most
to the mutual information - in this case the dark areas on the cross recurrence plot correspond 
to the gaps between magnetic storms. We conclude that the mutual information being measured 
between AU and AL results from magnetic storms appearing in both timeseries. 
Figure 8: Unthresholded recurrence plot of geomagnetic AU timeseries.

Figure 10: Unthresholded cross recurrence plot formed from those shown in Figs. 8 and 9.



5 Conclusions 



Recurrence plots are extremely versatile: they analyse a stream of data by comparing segments 
of it to other segments taken at earlier and later times. The data stream itself is thus used as an 
analysis tool, without any assumptions about the nature of the process that produced it. There are 
many statistical measures associated with recurrence plots, some of which are unique to recurrence 
plot analysis. Here we have described two of the most common statistics, and have demonstrated 
that they are related to better known measures from nonlinear timeseries analysis. In the case 
of exponential scaling of the correlation sum with embedding dimension, the determinism and 
entropy of line length distribution have been shown to be determined by $K_2$. This explains the
results of [13] and [19].

We have also shown that all recurrence plots are contained within a single parent plot which 
contains all of the statistics of its children. It is not strictly necessary to construct recurrence plots 
for a variety of embedding parameters, because the key statistics that we have considered are all 
contained within this parent plot, and many of these are directly derivable from the distribution 
of diagonal line lengths. This demonstrates clearly the effect of embedding on recurrence plots. 

A further result is that the mutual information between two timeseries can be obtained from their 
recurrence plots, and is related to counting the number of shared black dots. Similar comparisons 
of unthresholded recurrence plots yield the mutual information as a function of the threshold 
parameter $\epsilon$. This allows time-localized contributions to the mutual information to be assessed
and quantified, as we have shown for the example of geomagnetic indices. 

Comparisons between repeated patterns in signals from nonlinear systems are particularly valuable
when the systems in question are spatially extended and evolve in a nonstationary fashion.
Macroscopic plasmas, whether naturally occurring or created in fusion experiments, often fall into
this category, which presents a substantial challenge to the techniques of statistical and time series
analysis; see, for example, the discussions in recent studies of astrophysical [29], solar [30], and
fusion [31] plasma observations. The successful application of recurrence plots and the concepts of
information theory to the geomagnetic plasma timeseries studied in the present paper is, therefore, 
encouraging. 